Cleaning up the record on the maximal information coefficient and equitability.

نویسندگان

  • David N Reshef
  • Yakir A Reshef
  • Michael Mitzenmacher
  • Pardis C Sabeti
چکیده

Although we appreciate Kinney and Atwal’s interest in equitability and maximal information coefficient (MIC), we believe they misrepresent our work. We highlight a few of our main objections below. Regarding our original paper (1), Kinney and Atwal (2) state “MIC is said to satisfy not just the heuristic notion of equitability, but also the mathematical criterion of R equitability,” the latter being their formalization of the heuristic notion that we introduced. This statement is simply false. We were explicit in our paper that our claims regarding MIC’s performance were based on large-scale simulations: “We tested MIC’s equitability through simulations. . ..[These] show that, for a large collection of test functions with varied sample sizes, noise levels, and noise models, MIC roughly equals the coefficient of determination R relative to each respective noiseless function.” Although we mathematically proved several things about MIC, none of our claims imply that it satisfies Kinney and Atwal’s R equitability, which would require that MIC exactly equal R in the infinite data limit. Thus, their proof that no dependence measure can satisfy R equitability, although interesting, does not uncover any error in our work, and their suggestion that it does is a gross misrepresentation. Kinney and Atwal seem ready to toss out equitability as a useful criterion based on their theoretical result. We argue, however, that regardless of whether “perfect” equitability is possible, approximate notions of equitability remain the right goal for many data exploration settings. Just as the theory of NP completeness does not suggest we stop thinking about NP complete problems, but instead that we look for approximations and solutions in restricted cases, an impossibility result about perfect equitability provides focus for further research, but does not mean that useful solutions are unattainable. Similarly, as others have noted (3), Kinney and Atwal’s proof requires a highly permissive noise model, and so the attainability of R equitability under more limited noise models such as those in our work remains an open question. Finally, the authors argue that mutual information is more equitable than MIC. However, they provide as justification only a single noise model, only at limiting sample sizes ðn≥ 5;000Þ. As we’ve shown in followup work (4), which they themselves cite but fail to address, MIC is more equitable than mutual information estimation under many other realistic noise models even at a sample size of 5,000. Kinney and Atwal have stated, “. . .it matters how one defines noise” (5), and a useful statistic must indeed be robust to a wide range of noise models. Equally importantly, we’ve established in both our original and follow-up work that at sample size regimes less than 5,000, MIC is more equitable than mutual information estimates across all noise models tested. MIC’s superior equitability in these settings is not an “artifact” we neglected—as Kinney and Atwal suggest—but rather a weakness of mutual information estimation and an important consideration for practitioners. We expect that the understanding of equitability and MIC will improve over time and that better methods may arise. However, accurate representations of the work thus far will allow researchers in the area to most productively and collectively move forward.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Equitability and MIC: an FAQ

The original paper on equitability and the maximal information coefficient (MIC) [Reshef et al., 2011] has generated much discussion and interest, and so far MIC has enjoyed use in a variety of disciplines. This document serves to provide some basic background and understanding of MIC as well as to address some of the questions raised about MIC in the literature, and to provide pointers to rele...

متن کامل

Theoretical Foundations of Equitability and the Maximal Information Coefficient

The maximal information coefficient (MIC) is a tool for finding the strongest pairwise relationships in a data set with many variables [1]. MIC is useful because it gives similar scores to equally noisy relationships of different types. This property, called equitability, is important for analyzing high-dimensional data sets. Here we formalize the theory behind both equitability and MIC in the ...

متن کامل

Equitability Analysis of the Maximal Information Coefficient, with Comparisons

A measure of dependence is said to be equitable if it gives similar scores to equally noisy relationships of different types. Equitability is important in data exploration when the goal is to identify a relatively small set of strongest associations within a dataset as opposed to finding as many non-zero associations as possible, which often are too many to sift through. Thus an equitable stati...

متن کامل

Reply to Murrell et al.: Noise matters.

The concept of statistical " equitability " plays a central role in the 2011 paper by Reshef et al. (1). Formalizing equitability first requires formalizing the notion of a " noisy functional relationship, " that is, a relationship between two real variables, X and Y, having the form Y = f ðXÞ + η; where f is a function and η is a noise term. Whether a dependence measure satisfies equi-tability...

متن کامل

Equitability, mutual information, and the maximal information coefficient.

How should one quantify the strength of association between two random variables without bias for relationships of a specific form? Despite its conceptual simplicity, this notion of statistical "equitability" has yet to receive a definitive mathematical formalization. Here we argue that equitability is properly formalized by a self-consistency condition closely related to Data Processing Inequa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Proceedings of the National Academy of Sciences of the United States of America

دوره 111 33  شماره 

صفحات  -

تاریخ انتشار 2014